Supplementary Material: Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

Authors

  • I. Z. Reguly
  • G. R. Mudalige
  • M. B. Giles
Abstract

// x-direction cell advection. Stencil names list the offsets a dataset is read at,
// e.g. S2D_00_P10 = (0,0) and (+1,0); S2D_00_0P1 = (0,0) and (0,+1).
if (dir == g_xdir) {
  if (sweep_number == 1) {
    // First sweep: also reads the y-direction volume flux.
    ops_par_loop(advec_cell_kernel1_xdir, "advec_cell_kernel1_xdir", clover_grid, 2, rangexy,
                 ops_arg_dat(work_array1, 1, S2D_00, "double", OPS_WRITE),
                 ops_arg_dat(work_array2, 1, S2D_00, "double", OPS_WRITE),
                 ops_arg_dat(volume, 1, S2D_00, "double", OPS_READ),
                 ops_arg_dat(vol_flux_x, 1, S2D_00_P10, "double", OPS_READ),
                 ops_arg_dat(vol_flux_y, 1, S2D_00_0P1, "double", OPS_READ));
  } else {
    // Later sweeps: only the x-direction volume flux is read.
    ops_par_loop(advec_cell_kernel2_xdir, "advec_cell_kernel2_xdir", clover_grid, 2, rangexy,
                 ops_arg_dat(work_array1, 1, S2D_00, "double", OPS_WRITE),
                 ops_arg_dat(work_array2, 1, S2D_00, "double", OPS_WRITE),
                 ops_arg_dat(volume, 1, S2D_00, "double", OPS_READ),
                 ops_arg_dat(vol_flux_x, 1, S2D_00_P10, "double", OPS_READ));
  }
  ops_par_loop(advec_cell_kernel3_xdir, "advec_cell_kernel3_xdir", clover_grid, 2, rangexy_inner_plus2x,
               ops_arg_dat(vol_flux_x, 1, S2D_00, "double", OPS_READ),
               ops_arg_dat(work_array1, 1, S2D_00_M10, "double", OPS_READ),
               ops_arg_dat(xx, 1, S2D_00_P10_STRID2D_X, "int", OPS_READ),
               ops_arg_dat(vertexdx, 1, S2D_00_P10_M10_STX, "double", OPS_READ),
               ops_arg_dat(density1, 1, S2D_00_P10_M10_M20, "double", OPS_READ),
               ops_arg_dat(energy1, 1, S2D_00_P10_M10_M20, "double", OPS_READ),
               ops_arg_dat(mass_flux_x, 1, S2D_00, "double", OPS_WRITE),
               ops_arg_dat(work_array7, 1, S2D_00, "double", OPS_WRITE));
  ...
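Each `ops_arg_dat` call above declares the stencil through which a dataset is accessed, and information of this kind is what allows a chain of such loops to be tiled at run time. A minimal 1D sketch in plain C of the underlying idea, under stated assumptions: the two loops and the names `loop_a`, `loop_b`, `run_untiled`, `run_tiled` and the tile size `T` are hypothetical, and this is not the OPS implementation. Before each tile of the consumer loop, the producer loop is run just far enough ahead to cover the consumer's +1 read offset, so no point is computed twice.

```c
#include <assert.h>

/* Hypothetical loop A (pointwise producer): a[i] = 2 * in[i] for i in [lo, hi). */
static void loop_a(const double *in, double *a, int lo, int hi) {
    for (int i = lo; i < hi; ++i)
        a[i] = 2.0 * in[i];
}

/* Hypothetical loop B (consumer, 3-point stencil with offsets -1, 0, +1 on a). */
static void loop_b(const double *a, double *b, int lo, int hi) {
    for (int i = lo; i < hi; ++i)
        b[i] = a[i - 1] + a[i] + a[i + 1];
}

/* Reference order: each loop runs over its whole range before the next starts. */
void run_untiled(const double *in, double *a, double *b, int n) {
    loop_a(in, a, 0, n);
    loop_b(a, b, 1, n - 1);
}

/* Tiled order: B's range [1, n-1) is split into tiles of size T. Before each
   B tile, loop A continues from where it stopped, up to the last element the
   B tile reads (offset +1), so every point of A is computed exactly once. */
void run_tiled(const double *in, double *a, double *b, int n, int T) {
    assert(n >= 3 && T > 0);
    int a_done = 0;                                   /* A has run on [0, a_done) */
    for (int bs = 1; bs < n - 1; bs += T) {
        int be = (bs + T < n - 1) ? bs + T : n - 1;   /* B tile is [bs, be) */
        int a_end = be + 1;                           /* B tile reads up to a[be] */
        loop_a(in, a, a_done, a_end);
        a_done = a_end;
        loop_b(a, b, bs, be);
    }
    /* The last tile ends at be = n - 1, so A has covered all of [0, n). */
    assert(a_done == n);
}
```

Calling `run_untiled` and `run_tiled` on the same input should give identical `a` and `b` arrays for any `T > 0`; the amount by which A runs ahead (`be + 1`) comes directly from the largest positive offset in loop B's stencil.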

Similar resources

Writing productive stencil codes with overlapped tiling

Stencil computations constitute the kernel of many scientific applications. Tiling is often used to improve the performance of stencil codes for data locality and parallelism. However, tiled stencil codes typically require shadow regions, whose management becomes a burden to programmers. In fact, it is often the case that the code required to manage these regions, and in particular their ...
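The shadow regions mentioned in this abstract can be illustrated with overlapped tiling of two time steps: each tile recomputes a thin border of intermediate values itself instead of obtaining them from its neighbours. A minimal 1D sketch in C, assuming an unweighted 3-point stencil and hypothetical function names (`step`, `two_steps_ref`, `two_steps_overlapped`); it illustrates the general technique, not the cited paper's framework.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 3-point stencil: unweighted sum of u[i-1], u[i], u[i+1]. */
static double step(const double *u, int i) {
    return u[i - 1] + u[i] + u[i + 1];
}

/* Reference: two full sweeps; the result u2 is defined on [2, n-2). */
void two_steps_ref(const double *u0, double *u2, int n) {
    assert(n >= 5);
    double *u1 = malloc((size_t)n * sizeof *u1);
    assert(u1 != NULL);
    for (int i = 1; i < n - 1; ++i) u1[i] = step(u0, i);
    for (int i = 2; i < n - 2; ++i) u2[i] = step(u1, i);
    free(u1);
}

/* Overlapped tiling: each output tile [s, e) first computes the intermediate
   sweep on [s-1, e+1), i.e. the tile plus a one-cell shadow region on each
   side. Neighbouring tiles recompute those shadow cells rather than share
   them, so the tiles do not depend on each other. */
void two_steps_overlapped(const double *u0, double *u2, int n, int T) {
    assert(n >= 5 && T > 0);
    double *tmp = malloc((size_t)(T + 2) * sizeof *tmp);  /* tmp[k] holds u1[s-1+k] */
    assert(tmp != NULL);
    for (int s = 2; s < n - 2; s += T) {
        int e = (s + T < n - 2) ? s + T : n - 2;
        for (int i = s - 1; i < e + 1; ++i) tmp[i - (s - 1)] = step(u0, i);
        for (int i = s; i < e; ++i) u2[i] = step(tmp, i - (s - 1));
    }
    free(tmp);
}
```

The price of independence is redundant work: with tile size `T`, each tile evaluates `T + 2` intermediate points to produce `T` outputs, and the shadow grows with the number of fused steps and the stencil radius.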

Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications (University of Delaware, Department of Electrical and Computer Engineering, Computer Architecture and Parallel Systems Laboratory)

This paper fully develops Diamond Tiling, a technique to partition the computations of stencil applications such as FDTD. The Diamond Tiling technique is the result of optimizing the amount of useful computations that can be executed when a region of memory is loaded to the local memory of a multiprocessor chip. Diamond Tiling contributes to the state of the art on time tiling techniques in tha...

A Comparison of Compiler Tiling Algorithms

Linear algebra codes contain data locality which can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show padding improves performance of matrix multiply by over 100% in some...
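Intra-variable padding, as named in this abstract, lengthens each row of an array so that rows which would otherwise map to the same cache sets are shifted apart. A minimal sketch in C, assuming a row-major layout with a hypothetical leading dimension `ld` (row length including padding) and a hypothetical tile size `T`; it only shows where the padding lives in a tiled matrix multiply and does not model cache behaviour or choose the pad the way the cited paper does.

```c
#include <assert.h>
#include <stdlib.h>

/* Element (i, j) of a row-major matrix whose rows are ld elements long. */
#define AT(m, ld, i, j) ((m)[(size_t)(i) * (ld) + (j)])

/* Allocate an n x n matrix with rows padded to ld >= n elements; the
   ld - n trailing elements of each row are never used (zero-initialised). */
static double *alloc_padded(int n, int ld) {
    assert(ld >= n);
    return calloc((size_t)n * ld, sizeof(double));
}

/* C += A * B with all three loops tiled by T; every matrix uses the padded
   leading dimension ld, so only the address arithmetic sees the padding. */
void matmul_tiled(const double *A, const double *B, double *C, int n, int ld, int T) {
    assert(ld >= n && T > 0);
    for (int ii = 0; ii < n; ii += T)
        for (int kk = 0; kk < n; kk += T)
            for (int jj = 0; jj < n; jj += T)
                for (int i = ii; i < ii + T && i < n; ++i)
                    for (int k = kk; k < kk + T && k < n; ++k) {
                        double a = AT(A, ld, i, k);
                        for (int j = jj; j < jj + T && j < n; ++j)
                            AT(C, ld, i, j) += a * AT(B, ld, k, j);
                    }
}
```

For example, with `n` a large power of two one might pick `ld = n + 8`; the useful pad depends on the cache line size, associativity and tile size, which this sketch leaves to the caller.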

On the Parallel Execution Time of Tiled Loops

Many computationally-intensive programs, such as those for differential equations, spatial interpolation, and dynamic programming, spend a large portion of their execution time in multiply-nested loops that have a regular stencil of data dependences. Tiling is a well-known compiler optimization that improves performance on such loops, particularly for computers with a multileveled hierarchy of ...

An Auto-tuning Jit Compiler for Accelerating Multiple Stencil Computations

We present a JIT compiler with auto-tuning capabilities fusing multiple stencil computations. Data arrays for scientific computing or image processing often exceed cache-memory size. To take advantage of spatial and temporal locality, a common method is to partition the images into tiling blocks for multicore architectures. In realistic scenarios, the multiple image algorithms, most of which ar...
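The abstract combines JIT compilation, stencil fusion and auto-tuning. Only the auto-tuning part is sketched here, in C: the hypothetical `autotune_tile` times one tiled sweep of a 5-point stencil for each candidate tile size and keeps the fastest. There is no JIT compilation or fusion, the candidate list is an assumption supplied by the caller, and a real tuner would repeat each measurement and guard against the compiler removing work whose result is never used.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* One sweep of a 5-point stencil over the interior of an n x n grid,
   traversed in T x T tiles. */
static void sweep_tiled(const double *in, double *out, int n, int T) {
    for (int ii = 1; ii < n - 1; ii += T)
        for (int jj = 1; jj < n - 1; jj += T)
            for (int i = ii; i < ii + T && i < n - 1; ++i)
                for (int j = jj; j < jj + T && j < n - 1; ++j)
                    out[i * n + j] = in[i * n + j] + in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                   + in[i * n + j - 1] + in[i * n + j + 1];
}

/* Empirical auto-tuning: run the kernel once per candidate tile size and
   return the candidate with the smallest measured CPU time. */
int autotune_tile(int n, const int *candidates, int count) {
    assert(n >= 3 && count > 0);
    double *in = calloc((size_t)n * n, sizeof *in);
    double *out = calloc((size_t)n * n, sizeof *out);
    assert(in != NULL && out != NULL);
    int best = candidates[0];
    double best_time = -1.0;
    for (int c = 0; c < count; ++c) {
        assert(candidates[c] > 0);
        clock_t start = clock();
        sweep_tiled(in, out, n, candidates[c]);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best_time < 0.0 || elapsed < best_time) {
            best_time = elapsed;
            best = candidates[c];
        }
    }
    free(in);
    free(out);
    return best;
}
```

For instance, `autotune_tile(1024, (int[]){16, 32, 64}, 3)` returns whichever of the three sizes ran fastest on the current machine; the result is expected to vary between machines, which is the reason for tuning at run time.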

Publication date: 2017